Gene finding: putting the parts together

Author

  • Anders Krogh

Abstract

Any isolated signal of a gene is hard to predict. Current methods for promoter prediction, for instance, have either very low specificity or very low sensitivity: they either predict a huge number of false positives (fake promoters) or find only a small fraction of the true promoters. Essentially the same holds for splice site prediction: looked at in isolation, splice sites are very hard to recognize with good accuracy. This may seem contradictory, since some programs perform well on this task, such as those by Brunak, Engelbrecht, & Knudsen (1991) and Solovyev, Salamov, & Lawrence (1994). The reason for their success is that both methods also use the statistics of the coding exon region next to the splice site. Besides modelling the region immediately around the splice site very carefully, they can therefore also rule out candidate splice sites that do not sit next to something resembling a good coding region.

In bird-watching, the surroundings often give the necessary clues for deciding which bird you are watching in the distance, for instance whether it is seen in an open field or in a wood. Signal detection in genes is much like bird-watching: the surroundings must be taken into account. Therefore, to predict something like a splice site, you also need to predict coding exons, and vice versa (disregarding the splice sites of introns in untranslated regions). In a long DNA sequence, you would probably not expect to see a coding exon with two associated splice sites unless there are other exons with which it can combine. In this way, the predictions of the various parts of a gene should influence each other, and predicting the entire gene structure will also improve the predictions of the individual signals.
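The idea that a splice site is judged partly by its surroundings can be illustrated with a toy donor-site scorer. This is a minimal sketch under stated assumptions, not the method of the cited programs: the position weight matrix probabilities (0.7 for a consensus base, 0.1 otherwise, against a uniform background of 0.25), the consensus window `AGGTAAGT`, the 30-base upstream exon window and the function names `site_log_odds`, `open_in_some_frame` and `donor_score` are all hypothetical. As a deliberately crude stand-in for coding-region statistics, the upstream exon is only required to be free of stop codons in at least one reading frame.

```python
import math

# Toy donor-site model (hypothetical numbers, not trained on real data).
# Window = last 2 exon bases + first 6 intron bases, consensus AG|GTAAGT.
CONSENSUS = "AGGTAAGT"
INVARIANT = {2, 3}          # window positions of the intronic "GT" dinucleotide
P_MATCH, P_MISMATCH, P_BACKGROUND = 0.7, 0.1, 0.25
STOP_CODONS = {"TAA", "TAG", "TGA"}


def site_log_odds(window: str) -> float:
    """Log-odds of an 8-base donor window under the toy PWM vs. a uniform background."""
    score = 0.0
    for i, (base, cons) in enumerate(zip(window, CONSENSUS)):
        if i in INVARIANT:
            if base != cons:
                return -math.inf    # no GT, so not a canonical donor site
            p = 1.0
        else:
            p = P_MATCH if base == cons else P_MISMATCH
        score += math.log(p / P_BACKGROUND)
    return score


def open_in_some_frame(exon: str) -> bool:
    """True if at least one of the three reading frames has no stop codon."""
    for frame in range(3):
        codons = (exon[i:i + 3] for i in range(frame, len(exon) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            return True
    return False


def donor_score(seq: str, pos: int, exon_len: int = 30) -> float:
    """Score a candidate donor whose intronic 'GT' starts at index pos of seq."""
    if pos < 2 or pos + 6 > len(seq):
        return -math.inf
    signal = site_log_odds(seq[pos - 2:pos + 6])
    upstream = seq[max(0, pos - exon_len):pos]
    # Context: the toy model only accepts a donor right after a coding-like region.
    context = 0.0 if open_in_some_frame(upstream) else -math.inf
    return signal + context


if __name__ == "__main__":
    good = "GCT" * 9 + "CAG" + "GTAAGT" + "CCCCCC"              # open frame upstream
    bad = "TAACTAACTAA" + "G" * 17 + "AG" + "GTAAGT" + "CCCCCC"  # stops in all frames
    print(donor_score(good, 30))   # about 8.95
    print(donor_score(bad, 30))    # -inf: identical window, rejected by its context
```

Both candidates share the identical window `AGGTAAGT`, yet only the one preceded by an open reading frame keeps its site score. Real coding-region models assign graded scores (for example from codon statistics) rather than this all-or-nothing check.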
Therefore, gene prediction has in the last few years moved more and more towards the prediction of whole gene structures. These methods typically use modules for recognizing coding regions, splice sites, and translation initiation and termination sites, and some even use statistics of the 5’ and 3’ untranslated regions (UTRs), promoters, etc. Combining the predictions in this way has indeed improved the accuracy of gene prediction considerably, and as more is learned about transcription and translation, integrating further signals is likely to improve it even more.
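Whole-gene prediction can be sketched as choosing the best compatible set of candidate exons. The sketch below uses dynamic programming over candidate exons ("exon chaining", in the style of weighted interval scheduling); it is an illustration, not necessarily the approach taken in this chapter. It assumes each candidate already carries a combined score from the signal and content modules, and the `Exon` type, the `best_chain` function and the 20-base minimum intron length are hypothetical. Reading-frame compatibility between consecutive exons and the start and stop signals at the gene ends are ignored here, although a real gene finder has to enforce them.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Exon(NamedTuple):
    start: int     # 0-based, inclusive
    end: int       # 0-based, exclusive
    score: float   # hypothetical combined score from signal and content modules


def best_chain(exons: List[Exon], min_intron: int = 20) -> Tuple[float, List[Exon]]:
    """Highest-scoring set of non-overlapping exons, consecutive ones >= min_intron apart."""
    exons = sorted(exons, key=lambda e: e.end)
    ends = [e.end for e in exons]
    n = len(exons)
    best = [0.0] * (n + 1)    # best[i]: best total score using only the first i exons
    take = [False] * (n + 1)  # take[i]: whether exon i-1 is part of that optimum
    prev = [0] * (n + 1)      # prev[i]: where to continue when tracing back
    for i, e in enumerate(exons, start=1):
        # Number of exons ending at least min_intron bases before e starts;
        # in end-sorted order these are exactly the first j exons.
        j = bisect_right(ends, e.start - min_intron)
        with_e = best[j] + e.score
        if with_e > best[i - 1]:
            best[i], take[i], prev[i] = with_e, True, j
        else:
            best[i], prev[i] = best[i - 1], i - 1
    chain, i = [], n
    while i > 0:
        if take[i]:
            chain.append(exons[i - 1])
        i = prev[i]
    return best[n], chain[::-1]


if __name__ == "__main__":
    candidates = [Exon(0, 100, 5.0), Exon(90, 200, 4.0),
                  Exon(150, 250, 3.0), Exon(300, 400, 2.0)]
    score, chain = best_chain(candidates)
    print(score, [(e.start, e.end) for e in chain])
    # 10.0 [(0, 100), (150, 250), (300, 400)]
```

In this example the exon from 90 to 200 scores better on its own than the one from 150 to 250, but it overlaps the strong first exon, so the whole-structure optimum leaves it out: the prediction for one part is decided by the parts around it.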

Publication date: 1998